Abstract. These notes discuss the quantum algorithms we know of that can
solve problems significantly faster than the corresponding classical algorithms.
So far, we have discovered only a few techniques that produce a speedup
over classical algorithms. It is not yet clear whether this is because we
lack the intuition to discover more techniques, or because there are only a
few problems for which quantum computers can significantly speed up the
solution.

In the first section of these notes, I try to explain why the recent results 
about quantum computing have been so surprising. This section comes from 
a talk I have been giving for several years now, and discusses the history 
of quantum computing and its relation to the mathematical foundations of 
computer science. In Sections 2 and 3, I talk about the quantum computing
model and its relationship to physics. These sections rely heavily on two of
my papers [SIAM J. Comp. 26 (1997), 1484-1509; Doc. Math. Extra Vol.
ICM I (1998), 467-486]. Sections 4 and 5 illustrate the general technique
of using quantum Fourier transforms to find periodicity. Section 4 contains
an algorithm of Dan Simon showing that quantum computers are likely to be
exponentially faster than classical computers for some problems. Section 5
discusses my factoring algorithm, which was inspired in part by Dan Simon's 
paper. In the final section, I discuss Lov Grover's search algorithm, which 
illustrates a different technique for speeding up classical algorithms. These 
techniques for constructing faster algorithms for classical problems on quantum 
computers are the only two significant ones which have been discovered so far. 



1. History and Foundations 

The first results in theoretical computer science appeared before the
discipline of computer science existed; in fact, even before electronic
computers existed. Shortly after Gödel proved his famous incompleteness
result, several papers [27] were published that drew a distinction between
computable and non-computable functions. These papers showed that there are
some mathematically defined functions which are impossible to compute
algorithmically. Of course, proving such a theorem requires a mathematical
definition of what it means to compute a function. These papers contained
several distinct definitions of computation. What was observed was that,
although these definitions appear quite different, they all result in the same
class of computable functions. This led to the proposal of what is now called
the Church-Turing thesis, after two of its proponents. This thesis says that
any function that is computable by any means can also be computed by a Turing
machine. This is not a mathematical theorem, because it does not give a
mathematically precise definition of computable; it is rather a statement about
the real world. In fact, many such mathematical theorems have been proven for
various definitions of computation. What was not widely appreciated until
recently is that, since the Church-Turing thesis implicitly refers to the
physical world, it is in fact a statement about physics. In the sixty years
since Church proposed his thesis, nobody has discovered any counterexamples to
it and it is now widely accepted. The current theories of physics appear to
support this thesis, although, as we do not yet have a comprehensive theory of
physical laws, we must wait before making a final judgment on this thesis.

The model that most of these early papers used for intuition about computation
does not appear to have been a digital computer, as these did not yet exist.
Rather, the papers appear to have been inspired by considering a mathematician
scribbling on sheets of paper. Less than a decade after 1936, the first digital
computers were built. As the Church-Turing thesis asserts, the class of
functions computable by digital machines with arbitrarily large amounts of time
and memory is indeed the class of functions computable by a Turing machine.

With the advent of practical digital computers, it became clear that the
distinction between computable and non-computable was much too coarse for
practical use, as actual computers do not have an arbitrary amount of time and
memory.
After all, it doesn't do much good in practice to know that a function is com- 
putable if the sun will burn out long before any conceivable computer could reach 
the end of the computation. What was needed was some classification of functions 
as efficiently or inefficiently computable, based on their computational difficulty. 
In the late 1960's and early 1970's theoretical computer scientists came up with an 
asymptotic classification that reflects this distinction moderately well in practice, 
and is also tractable to work with theoretically, that is, useful for proving theo- 
rems about the difficulty of computation. Computer scientists call an algorithm 
polynomial-time if the running time grows polynomially in the input size, and they 
say that a problem is in the complexity class P if there is a polynomial-time
algorithm solving it. This does not capture the intuitive notion of efficiency
perfectly (hardly anybody would claim that an algorithm with an $n^{30}$ running
time is feasible), but it works reasonably well in practice. Experience seems to
show that most natural problems in P tend to have reasonably efficient
algorithms, and most natural problems not in P tend not to be solvable much
faster than exponential time. Further, the complexity class P has been very
useful for proving theorems, an advantage which is unlikely to hold for any
definition which differentiates between $O(n^3)$ and $O(n^{30})$ algorithms.

For the definition of P to make sense, you need to know that it does not depend 
on the exact type of computer used for the computation. This led to a "folk" thesis, 
which we call the polynomial Church's thesis, whose origins appear to be impossible 
to pin down, but which has nevertheless been widely referred to in the literature. 
This thesis says that any physically computable function can be computed on a 
Turing machine with at most a polynomial increase in the running time. That is, if 
a function can be computed on a physical computer in time T, it can be computed 
on a Turing machine in time $O(T^c)$ for some constant $c$ depending only on the class
of computing machine used. 

Why might this folk thesis be true? One explanation might be that the physical 
laws of our universe are efficiently simulable by computers. This would explain it 
via the following argument: if we have some physical machine that solves a problem, 
then we can simulate the physical laws driving this machine, and by our hypothesis 



INTRODUCTION TO QUANTUM ALGORITHMS 






this simulation runs in polynomial time. Conversely, if we are interested in coun- 
terexamples to the polynomial Church's thesis, we should look at physical systems 
which appear to be very difficult to simulate on a digital computer. Two classes of 
physical systems immediately spring to mind for which simulation currently con- 
sumes vast amounts of computer time, even while trying to solve relatively simple 
problems. One of these is turbulence, about which I unfortunately have nothing 
further to say. The other is quantum mechanics. 

In 1982, Feynman [19] argued that simulating quantum mechanics inherently
requires an exponential amount of overhead, so that it must take enormous
amounts of computer time no matter how clever you are. This realization was
reached independently, and somewhat earlier, in 1980, in the Soviet Union by
Yuri Manin. It is not true that all quantum mechanical systems are difficult to
simulate; some of them have exact solutions and others have very clever
computational shortcuts, but it does appear to be true when simulating a generic
quantum mechanical system. Another thing Feynman suggested in this paper was
the use of quantum computers to get around this. That is, a computer based on
fundamentally quantum mechanical phenomena might be used to simulate quantum
mechanics much more efficiently. In much the same spirit, you could think of a
wind tunnel as a "turbulence computer". Benioff had already shown how quantum
mechanical processes could be used as the basis of a classical Turing machine.
Feynman refined these ideas in a later paper.

In 1985, David Deutsch [15] gave an abstract model of quantum computation,
and also raised the question of whether quantum computers might actually be
useful for classical problems. Subsequently, he and a number of other people
[16, 39] came up with rather contrived-appearing problems for which quantum
computers seemed to work better than classical computers. It was by studying
these algorithms, especially Dan Simon's [39], that I figured out how to design
the factoring algorithm.



2. The Quantum Circuit Model 

In this section we discuss the quantum circuit model for quantum computation.
This is a rigorous mathematical model for a quantum computer. It is not the
only mathematical model that has been proposed for quantum computation; there
are also the quantum Turing machine model and the quantum cellular automata
model. All these models result in the same class of polynomial-time quantum
computable functions. These are, of course, not the only potential models for
quantum computation, and some of the assumptions made in these models, such as
unitarity of all gates and the lack of fermion/boson particle statistics,
clearly are not physically realistic, in that it is easy to conceive of machines
that do not conform to these assumptions. However, there do not seem to be any
physically realistic models which have more computational power than the ones
listed above. Neither non-unitarity nor fermions add significant power to the
mathematical model. Of these models, the quantum circuit model is possibly the
simplest to describe. It is also easier to connect with possible physical
implementations of quantum computers than the quantum Turing machine model. The
disadvantage of this model is
that it is not naturally a uniform model. Uniformity is a technical condition arising 
in complexity theory, and to make the quantum circuit model uniform, additional 
constraints must be imposed on it. This issue is discussed later.

In analogy with a classical bit, a two-state quantum system is called a qubit, or
quantum bit. Mathematically, a qubit takes a value in the vector space
$\mathbb{C}^2$. We single out two orthogonal basis vectors in this space, and
label these $V_0$ and $V_1$. In Dirac's "bra-ket" notation, which comes from
physics and is commonly used in the quantum computing field, these are
represented as $|0\rangle$ and $|1\rangle$. More precisely, quantum states are
invariant under multiplication by scalars, so a qubit lives in two-dimensional
complex projective space. To conform with physics usage, we treat qubits as
column vectors and operate on them by left multiplication.

One of the fundamental principles of quantum mechanics is that the joint
quantum state space of two systems is the tensor product of their individual
quantum state spaces. Thus, the quantum state space of $n$ qubits is the space
$\mathbb{C}^{2^n}$. The basis vectors of this space are parameterized by binary
strings of length $n$. We make extensive use of the tensor decomposition of this
space into $n$ copies of $\mathbb{C}^2$, where we represent a basis state $V_b$
corresponding to the binary string $b_1 b_2 \cdots b_n$ by

$$V_{b_1 b_2 \cdots b_n} = V_{b_1} \otimes V_{b_2} \otimes \cdots \otimes V_{b_n}.$$

In "bra-ket" notation, this state is written as $|b_1 b_2 b_3 \cdots b_n\rangle$
or equivalently, as the tensor product
$|b_1\rangle |b_2\rangle |b_3\rangle \cdots |b_n\rangle$. Generally, we use
position to distinguish the $n$ different qubits. Occasionally we need some
other notation for distinguishing them, in which case we denote the $i$'th qubit
by $V^{[i]}$. Since quantum states are invariant under multiplication by
scalars, they can without loss of generality be normalized to be unit length
vectors; except where otherwise noted, quantum states in this paper will be
assumed to be normalized. Quantum computation takes place in the quantum state
space of $n$ qubits, $\mathbb{C}^{2^n}$, and obtains extra computational power
from its exponential dimensionality.
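
As a small illustration of the tensor decomposition, the following sketch builds
a basis vector as a Kronecker product of single-qubit basis vectors. The helper
names `basis_state` and `index_of` are hypothetical, and the convention that the
first qubit is the most significant bit is an assumption of this example rather
than something fixed by the text.

```python
import numpy as np

V0 = np.array([1.0, 0.0])  # single-qubit basis vector V_0
V1 = np.array([0.0, 1.0])  # single-qubit basis vector V_1

def basis_state(bits):
    """Return V_{b1 b2 ... bn} as a Kronecker product of V_0 / V_1 factors."""
    state = np.array([1.0])
    for b in bits:
        state = np.kron(state, V1 if b else V0)
    return state

def index_of(bits):
    """Index of the single nonzero entry: the bit string read in binary
    (assumed convention: first qubit is the most significant bit)."""
    return int("".join(str(b) for b in bits), 2)

v = basis_state([1, 0, 1])
print(v.shape, int(np.argmax(v)), index_of([1, 0, 1]))
```

The vector for three qubits already has $2^3 = 8$ entries, which is the
exponential growth in dimension referred to above.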

In a usable computer, we need some means of giving it the problem we want 
solved (input), some means of extracting the answer from it (output), and some 
means of manipulating the state of the computer to transform the input into the 
desired output (computation). We next briefly describe input and output for the 
quantum circuit model. We then take a brief detour to describe the classical circuit 
model; this will motivate the rules for performing the computation on a quantum 
computer. 

Since we are comparing quantum computers to classical computers, and solving 
classical problems on a quantum computer, in this paper the input to a quantum 
computer will always be classical information. It can thus be expressed as a
binary string $S$ of some length $k$. We need to encode this in the initial
quantum state of the computer, which must be a vector in $\mathbb{C}^{2^n}$. The
way we do this is to concatenate the bit string $S$ with $n - k$ 0's to obtain
the length $n$ string $S0\ldots0$. We then initialize the quantum computer in
the state $V_{S0\ldots0}$. Note that the number
of qubits is in general larger than the input. These extra qubits, which we can take 
to be initialized to 0, are often required for workspace in implementing quantum 
algorithms. 

At the end of a computation, the quantum computer is in a state which is a
unit vector in $\mathbb{C}^{2^n}$. This state can be written explicitly as
$W = \sum_s \alpha_s V_s$, where $s$ ranges over binary strings of length $n$,
$\alpha_s \in \mathbb{C}$, and $\sum_s |\alpha_s|^2 = 1$. These $\alpha_s$ are
called probability amplitudes, and we say that $W$ is a superposition of basis
vectors $V_s$. In quantum mechanics, the Heisenberg uncertainty principle tells
us that we cannot measure the complete quantum state of this system. There are a
large number of permissible measurements; for example, any orthogonal basis of
$\mathbb{C}^{2^n}$ defines a measurement whose possible outcomes are the elements
of this basis. However, we assume that the output is obtained by projecting each
qubit onto the basis $\{V_0, V_1\}$. This measurement has the great advantage of
being simple, and it appears that any physically reasonable measurement can be
accomplished by first doing some precomputation and then making the above
canonical measurement.

When applied to a state $\sum_s \alpha_s V_s$, this projection produces the
string $s$ with probability $|\alpha_s|^2$. The quantum measurement process is
inherently probabilistic. Thus we do not require that the computation give the
right answer all the time, only that we obtain the right answer at least 2/3 of
the time. Here, the probability 2/3 can be replaced by any number strictly
between 1/2 and 1 without altering the class of functions that can be computed
in polynomial time by quantum computers: if the probability of obtaining the
right answer is strictly larger than 1/2, it can be amplified by running the
computation several times and taking the majority vote of the results of these
separate computations.
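
The effect of majority voting can be checked with a short calculation. The
sketch below is only an illustration: the function name `majority_success` is
hypothetical, and it assumes the runs are independent and that the number of
runs is odd, so that ties cannot occur.

```python
from math import comb

def majority_success(p, runs):
    """Probability that more than half of `runs` independent runs are correct,
    when each run is correct with probability p (runs assumed odd)."""
    return sum(comb(runs, k) * p**k * (1 - p)**(runs - k)
               for k in range(runs // 2 + 1, runs + 1))

# Success probability of the majority vote for a few repetition counts.
for runs in (1, 3, 5, 101):
    print(runs, majority_success(2 / 3, runs))
```

With three runs at success probability 2/3 the majority is correct with
probability 20/27, and the failure probability keeps shrinking as the number of
repetitions grows.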

In order to motivate the rules for state manipulation in a quantum circuit, we now 
take a brief detour and describe the classical circuit model. Recall that a classical 
circuit can always be written solely with the three gates AND ($\wedge$), OR
($\vee$) and NOT ($\neg$). These three gates are thus said to form a universal
set of gates. Besides these three gates, note that we also need elements which
duplicate the values on wires. It is arguable that these elements should also be
classified as gates. These duplicating "gates" are not possible in the domain of
quantum computing, because of the theorem that an arbitrary quantum state cannot
be cloned (duplicated).



A quantum circuit is similarly built out of logical quantum wires carrying qubits, 
and quantum gates acting on these qubits. Each wire corresponds to one of the n 
qubits. We assume each gate acts on either one or two wires. The possible physical 
transformations of a quantum system are unitary transformations, so each quantum 
gate can be described by a unitary matrix. A quantum gate on one qubit is then 
described by a 2 x 2 matrix, and a quantum gate on two qubits by a 4 x 4 matrix. 
Note that since unitary matrices are invertible, the computation is reversible; thus 
starting with the output and working backwards one obtains the input. Further 
note that for quantum gates, the dimension of the output space is equal to that of 
the input space, so at all times during the computation we have n qubits carried 
on n quantum wires. 

It should be noted that these requirements of unitarity and of maintaining only
the original $n$ qubits at all times need to be revised for dealing with noisy
gates, an area not covered in this paper. In fact, it can be shown that with
these requirements, noisy unitary gates make it impossible to carry out long
computations; some means of eliminating noise by resetting qubits to values
near 0 is required.

Quantum gates acting on one or two qubits ($\mathbb{C}^2$ or $\mathbb{C}^4$)
naturally induce a transformation on the state space of the entire quantum
computer ($\mathbb{C}^{2^n}$). For example, if $A$ is a $4 \times 4$ matrix
acting on qubits $i$ and $j$, the induced action on a basis vector of
$\mathbb{C}^{2^n}$ is

(2.1)   $A^{[i,j]} V_{b_1 b_2 \cdots b_n} = \sum_{s=0}^{1} \sum_{t=0}^{1} A_{st,\, b_i b_j} V_{b_1 \cdots b_{i-1}\, s\, b_{i+1} \cdots b_{j-1}\, t\, b_{j+1} \cdots b_n}$,

where $A_{st,\, b_i b_j}$ denotes the entry of $A$ in row $st$ and column
$b_i b_j$.

This is the tensor product of $A$ (acting on qubits $i$ and $j$) with $n-2$
identity matrices (acting on each of the remaining qubits). When we multiply a
general vector by a quantum gate, it can have negative and positive coefficients
which cancel out, leading to quantum interference.
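
The induced action of a two-qubit gate on the full state space can be sketched
numerically without ever forming the $2^n \times 2^n$ matrix. In the
illustration below, the function name `apply_two_qubit_gate`, the 0-based qubit
numbering, and the convention that qubit 0 is the most significant bit are all
assumptions of this example.

```python
import numpy as np

def apply_two_qubit_gate(A, state, i, j, n):
    """Apply the 4x4 matrix A to qubits i and j (0-based, i != j) of an
    n-qubit state vector of length 2**n; other qubits see the identity."""
    psi = state.reshape([2] * n)       # one tensor axis per qubit
    gate = A.reshape(2, 2, 2, 2)       # axes: (s, t, b_i, b_j)
    # Contract the gate's input axes with the axes of qubits i and j.
    psi = np.tensordot(gate, psi, axes=([2, 3], [i, j]))
    # tensordot puts the output axes (s, t) first; move them back to i and j.
    psi = np.moveaxis(psi, [0, 1], [i, j])
    return psi.reshape(2 ** n)

# CNOT with the first qubit of the pair as control (row st, column b_i b_j).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

v = np.arange(8, dtype=float)          # an arbitrary (unnormalized) 3-qubit vector
direct = np.kron(CNOT, np.eye(2)) @ v  # explicit tensor product with the identity
print(np.allclose(apply_two_qubit_gate(CNOT, v, 0, 1, 3), direct))
```

The comparison against `np.kron(CNOT, np.eye(2))` checks the statement that the
induced map is the tensor product of the gate with identity matrices on the
remaining qubits.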

As for classical circuits, there are universal sets of gates for quantum
circuits; such a universal set of gates is sufficient to build circuits for any
quantum computation. One particularly useful universal set of gates is the set
of all one-qubit gates together with a specific two-qubit gate called the
Controlled NOT (CNOT). These gates can efficiently simulate any quantum circuit
whose gates act on only a constant number of qubits. On basis vectors, the CNOT
gate negates the second (target) qubit if and only if the first (control) qubit
is 1. In other words, it takes $V_{XY}$ to $V_{XZ}$ where $Z = X + Y \pmod 2$.
This corresponds to the unitary matrix

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

Note that the CNOT is a classical reversible gate. To obtain a universal set of 
classical reversible gates, you need at least one reversible three-bit gate, such as 
a Toffoli gate; otherwise you can only perform linear Boolean computations. A 
Toffoli gate is a doubly controlled NOT, which negates the 3rd bit if and only if 
the first two are both 1. By itself the Toffoli gate is universal for reversible classical 
computation, as it can simulate both AND and NOT gates. Thus, if you can
make a Toffoli gate, you can perform any reversible classical computation. Further,
as long as the input is not erased, any classical computation can be efficiently
performed reversibly, and thus implemented efficiently by Toffoli gates. The
matrix corresponding to a Toffoli gate is 



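(2.2)   $\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix}$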
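
The CNOT and Toffoli matrices are permutation matrices, and this can be made
concrete with a short sketch. The helper names (`permutation_gate`, `run`, and
so on) are hypothetical, and the convention that the first bit is the most
significant is an assumption of this example. The checks at the end illustrate
how a Toffoli gate simulates AND (third bit initialized to 0) and NOT (first two
bits fixed to 1).

```python
import numpy as np

def bits_to_int(bits):
    """Read a tuple of bits as a binary integer, first bit most significant."""
    return int("".join(str(b) for b in bits), 2)

def int_to_bits(x, n_bits):
    """Inverse of bits_to_int for a fixed number of bits."""
    return tuple((x >> (n_bits - 1 - k)) & 1 for k in range(n_bits))

def permutation_gate(n_bits, f):
    """Matrix of a reversible classical map f on bit tuples:
    column x has a single 1 in row f(x)."""
    dim = 2 ** n_bits
    U = np.zeros((dim, dim), dtype=int)
    for x in range(dim):
        U[bits_to_int(f(int_to_bits(x, n_bits))), x] = 1
    return U

CNOT = permutation_gate(2, lambda b: (b[0], b[0] ^ b[1]))
TOFFOLI = permutation_gate(3, lambda b: (b[0], b[1], b[2] ^ (b[0] & b[1])))

def run(U, bits):
    """Apply a permutation gate to a classical bit tuple."""
    y = int(np.argmax(U[:, bits_to_int(bits)]))
    return int_to_bits(y, len(bits))

# AND: with the third bit set to 0, the output third bit is a AND b.
print([run(TOFFOLI, (a, b, 0))[2] for a in (0, 1) for b in (0, 1)])
# NOT: with the first two bits set to 1, the third bit is negated.
print([run(TOFFOLI, (1, 1, c))[2] for c in (0, 1)])
```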



We now define the complexity class BQP, which stands for bounded-error quan- 
tum polynomial time. This is the class of languages which can be computed on 
a quantum computer in polynomial time, with the computer giving the correct 
answer at least 2/3 of the time. 

To give a rigorous definition of this complexity class using quantum circuits, 
we need to impose uniformity conditions. Any specific quantum circuit can only 
compute a function whose domain (input) is binary strings of a specific length. 
To use the quantum circuit model to implement functions taking arbitrary length 
binary strings for input, we need a family of quantum circuits, that contains one 
circuit for inputs of each length. Without any further conditions on this family of 
circuits, the designer of this circuit family could hide an uncomputable function in 
the design of the circuits for each input length. This definition would thus result in 
the unfortunate inclusion of uncomputable functions in the complexity class BQP.
One should note that there is a name for this nonuniform class of functions. It is
called BQP/poly, meaning that there can be at most a polynomial amount of extra
information included in the circuit design.

To exclude this possibility of including non-computable information in the cir- 
cuit, we require uniformity conditions on the circuit family. The easiest way of 
doing this is to require a classical Turing machine that on input n outputs a de- 
scription of the circuit for length n inputs, and which runs in time polynomial 
in n. For quantum computing, we need an additional uniformity condition on the 
circuits. It is also possible for the circuit designer to hide uncomputable (or
hard-to-compute) information in the unitary matrices corresponding to quantum
gates. We thus require that the $k$'th digit of the entries of these matrices can be
computed by a second Turing machine in time polynomial in k and n. Although we 
do not have space to discuss this fully, the power of the classical machines designing 
the circuit family can actually be varied over a wide range; they can be varied from 
classes much smaller than P to the classical randomized class BPP. This helps us 
convince ourselves that we have the right definition of BQP. 

The definition of polynomial time computable functions on a quantum com- 
puter is thus those functions computable by a uniform family of circuits whose size 
(number of gates) is polynomial in the length of the input, and which for any input 
gives the right answer at least 2/3 of the time. The corresponding set of languages 
(languages are functions with values in {0, 1}) is called BQP. 

3. Relation of the Model to Quantum Physics 

The quantum circuit model of the previous section is much simplified from the 
realities of quantum physics. There are operations possible in physical quantum 
systems which do not correspond to any simple operation allowable in the quantum 
circuit model, and complexities that occur when performing experiments that are 
not reflected in the quantum circuit model. This section contains a brief discussion 
of these issues, some of which are discussed more thoroughly in [18].

In everyday life, objects behave very classically, and on large scales we do not see 
any quantum mechanical behavior. This is due to a phenomenon called decoherence, 
which makes superpositions of states decay, and makes large-scale superpositions of 
states decay very quickly. A thorough, elementary, discussion of decoherence can be 
found in p7| ; one reason it occurs is that we are dealing with open systems rather 
than closed ones. Although closed systems quantum mechanically undergo unitary 
evolution, open systems need not. They are subsystems of systems undergoing 
unitary evolution, and the process of taking subsystems does not preserve unitarity. 

However hard we may try to isolate quantum computers from the environment, 
it is virtually inevitable that they will still undergo some decoherence and errors. 
We need to know that these processes do not fundamentally change their behavior. 
Using no error correction, if each gate results in an amount of decoherence and
error of order $1/t$, then $O(t)$ operations can be performed before the quantum
state becomes so noisy as to usually give the wrong answer. Active error
correction can improve this situation substantially; this is discussed in
Gottesman's notes for this course.

In some proposed physical architectures for quantum computers, there are
restrictions that are more severe than those of the quantum circuit model given
in the preceding section. Many of these restrictions do not change the class
BQP. For example, it might be the case that a gate could only be applied to a
pair of adjacent qubits. We can still operate on a pair of arbitrary qubits: by
repeatedly exchanging one of these qubits with a neighbor we can bring this pair
together. If there are $n$ qubits in the computer, this increases the
computation time by at most a factor of order $n$, preserving the complexity
class BQP.

The quantum circuit model described in the previous section postpones all mea- 
surements to the end, and assumes that we are not allowed to use probabilistic steps. 
Both of these possibilities are allowed in general by quantum mechanics, but neither 
possibility makes the complexity class BQP larger. For fault-tolerant quantum
computing, however, it is very useful to permit measurements in the middle of the 
computation, in order to measure and correct errors. 

The quantum circuit model also assumes that we only operate on a constant 
number of qubits at a time. In general quantum systems, all the qubits evolve simul- 
taneously according to some Hamiltonian describing the system. This simultaneous 
evolution of many qubits cannot be described by a single gate in our model, which 
only operates on two qubits at once. In a realistic model of quantum computation, 
however, we cannot allow general Hamiltonians, since they are not experimentally 
realizable. Some Hamiltonians that act on all the qubits at once are experimentally 
realizable. It would be nice to know that even though these Hamiltonians cannot 
be directly described by our model, they cannot be used to compute functions not 
in BQP in polynomial time. This could be accomplished by showing that systems 
with such Hamiltonians can be efficiently simulated by a quantum computer. Some 
work has been done on simulating Hamiltonians on quantum computers jl], [l5| , 
but I do not believe this question has been completely addressed yet. 



4. Simon's Algorithm 

In this section, we give Dan Simon's algorithm [39] for a problem that takes
exponential time on a classical computer, but quadratic time on a quantum
computer. This is an "oracle" problem, in that there is a function $f$ given as
a "black box" subroutine, and the computer is allowed to compute $f$, but is not
allowed to look at the code for $f$. In fact, to prove the lower bound on a
classical computer, we must permit the computer to use functions $f$ which are
not efficiently computable.

We now describe Simon's problem. The computer is given a function $f$ mapping
$\mathbb{F}_2^n$ to $\mathbb{F}_2^n$ which has the property that there is a
nonzero $c$ such that, for all $x \neq y$,

(4.1)   $f(x) = f(y) \iff x = y + c$ in $\mathbb{F}_2^n$.

Here, the addition is bitwise binary addition. Essentially, this is a function
which is periodic over $\mathbb{F}_2^n$ with period $c$.

We now describe the lower bound for a classical computer. Suppose that the
function $f$ is chosen at random from all functions with property (4.1). We show
that you need $\Omega(2^{n/2})$ function evaluations to find $c$. Suppose that
you have evaluated $s$ values of $f$. You have then eliminated at most one value
of $c$ for each pair of the $s$ values of $f$ computed, but $c$ is equally
likely to be any of the remaining possibilities. Thus, after computing $s$
values of $f$, you will have eliminated at most $s(s-1)/2$ values of $c$. At
least half the time, you must try more than half the possibilities for $c$, and
this takes $\Omega(2^{n/2})$ function evaluations.
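
To get a feeling for the classical difficulty, one can experiment with the
obvious classical strategy of evaluating $f$ until two inputs collide. The
sketch below is an illustration only: the function names are hypothetical, $f$
is stored as a lookup table (so it only works for small $n$), and bit strings
are represented as integers with bitwise XOR as the addition.

```python
import random

def make_simon_function(n, c, rng):
    """A random 2-to-1 function on n-bit integers with f(x) == f(x ^ c), c != 0."""
    labels = list(range(2 ** n))
    rng.shuffle(labels)
    f = {}
    for x in range(2 ** n):
        if x not in f:
            f[x] = f[x ^ c] = labels.pop()
    return f

def classical_find_period(f, n, rng):
    """Evaluate f at distinct random points until two inputs collide.
    Returns (c, number of evaluations used)."""
    seen = {}
    order = list(range(2 ** n))
    rng.shuffle(order)
    for queries, x in enumerate(order, start=1):
        y = f[x]
        if y in seen:
            return seen[y] ^ x, queries
        seen[y] = x
    raise ValueError("f has no collision")

rng = random.Random(0)
f = make_simon_function(10, 0b1011010011, rng)
print(classical_find_period(f, 10, rng))
```

Running this for several values of $n$ gives query counts that grow roughly like
$2^{n/2}$, in line with the counting argument above.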

We now describe Simon's algorithm for finding the period on a quantum computer.
To do this, we need to introduce the Hadamard gate,

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}.$$

Now, suppose that we apply the Hadamard transformation to each of $k$ qubits. We
obtain, for a vector $a$ in $\mathbb{F}_2^k$,

(4.2)   $H^{\otimes k}(V_a) = \frac{1}{2^{k/2}} \sum_{b=0}^{2^k-1} (-1)^{a \cdot b} V_b$.

It is easy to see that each entry of the matrix $H^{\otimes k}$ is
$\pm 2^{-k/2}$. Further, the $(a, b)$ entry picks up a factor of $-1$ for each
position which is 1 in both $a$ and $b$, giving a sign of $(-1)^{a \cdot b}$.
Here,

$$a \cdot b = \sum_{i=1}^{k} a_i b_i \pmod 2$$

is the binary inner product of $a$ and $b$. This is in fact the Fourier
transform over $\mathbb{F}_2^k$.
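
The description of the entries of the $k$-fold Hadamard transformation is easy
to check numerically for small $k$. The sketch below is an illustration; the
names `hadamard_power` and `binary_dot` are hypothetical, and bit strings are
represented as integers.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def hadamard_power(k):
    """The 2^k x 2^k matrix H tensored with itself k times."""
    M = np.array([[1.0]])
    for _ in range(k):
        M = np.kron(M, H)
    return M

def binary_dot(a, b):
    """Binary inner product a . b (mod 2) of two bit strings given as integers."""
    return bin(a & b).count("1") % 2

k = 3
M = hadamard_power(k)
ok = all(np.isclose(M[a, b], (-1) ** binary_dot(a, b) / 2 ** (k / 2))
         for a in range(2 ** k) for b in range(2 ** k))
print(ok)
```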



We are now ready to describe Simon's algorithm. We will use two registers, both
with $n$ qubits. We start with the state $V_0 \otimes V_0$, where $0$ denotes
the all-zero string of length $n$. The first step is to take each qubit in the
first register to $\frac{1}{\sqrt{2}}(V_0 + V_1)$, putting the first register in
an equal superposition of all binary strings of length $n$. The computer is now
in the state

$$2^{-n/2} \sum_{x=0}^{2^n-1} V_x \otimes V_0.$$

The second step is to compute $f(x)$ in the second register. We now obtain the state

$$2^{-n/2} \sum_{x=0}^{2^n-1} V_x \otimes V_{f(x)}.$$

Note that since the input $x$ of the function $f(x)$ is kept in memory, this is
a reversible classical transformation, and thus unitary. The third step is to
take the Fourier transform of the first register. This leaves the computer in
the state

$$2^{-n} \sum_{x=0}^{2^n-1} \sum_{y=0}^{2^n-1} (-1)^{x \cdot y}\, V_y \otimes V_{f(x)}.$$



Finally, we observe the state of the computer in the basis $V_i \otimes V_j$. We
see the state $V_y \otimes V_{f(x)}$ with probability equal to the square of its
amplitude in the above sum. There are exactly two inputs which give the value
$f(x)$, namely $x$ and $x + c$. The probability of observing
$V_y \otimes V_{f(x)}$ is thus

$$2^{-2n} \left( (-1)^{x \cdot y} + (-1)^{(x+c) \cdot y} \right)^2.$$

This probability is either $2^{-2n+2}$ or 0, depending on whether $y \cdot c$ is
0 or 1. The above measurement thus produces a random $y$ with $c \cdot y = 0$.
It is straightforward to show that, with high probability, $O(n)$ such $y$'s
chosen at random will be of full rank in $c^{\perp}$, the $(n-1)$-dimensional
space perpendicular to $c$, and thus determine $c$ uniquely. Thus, if we repeat
the above procedure $O(n)$ times, we will be able to deduce $c$. Since each of
these repetitions takes $O(n)$ steps on the quantum computer, we obtain the
answer in $O(n^2 + nF)$ time, where $F$ is the cost of evaluating the function
$f$.
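
The steps of Simon's algorithm can be followed on a classical computer for very
small $n$ by storing all the amplitudes explicitly. The following is only a
sketch under several assumptions: the function names are hypothetical, $f$ is
given as a list of integers, the state is kept as a dense $2^n \times 2^n$ array
of real amplitudes (which of course takes exponential space), and the final
classical step eliminates candidates for $c$ by brute force, where Gaussian
elimination over $\mathbb{F}_2$ would be used for an efficient implementation.

```python
import numpy as np

def binary_dot(a, b):
    """Binary inner product (mod 2) of two bit strings given as integers."""
    return bin(a & b).count("1") % 2

def simon_sample(f, n, rng):
    """Simulate one run of the quantum part; returns y with y . c = 0 (mod 2)."""
    N = 2 ** n
    # Steps 1 and 2: equal superposition over x, with f(x) in the second register.
    state = np.zeros((N, N))           # state[x, v] = amplitude of V_x (x) V_v
    for x in range(N):
        state[x, f[x]] = N ** -0.5
    # Step 3: Hadamard (Fourier) transform on the first register.
    Hn = np.array([[(-1) ** binary_dot(a, b) for b in range(N)]
                   for a in range(N)]) / np.sqrt(N)
    state = Hn @ state
    # Measurement: outcome (y, v) has probability amplitude^2; keep only y.
    probs = (state ** 2).sum(axis=1)
    return int(rng.choice(N, p=probs / probs.sum()))

def simon_find_period(f, n, rng, max_runs=200):
    """Repeat the quantum part until only one nonzero candidate c remains."""
    candidates = set(range(1, 2 ** n))
    for _ in range(max_runs):
        y = simon_sample(f, n, rng)
        candidates = {c for c in candidates if binary_dot(c, y) == 0}
        if len(candidates) == 1:
            return candidates.pop()
    raise RuntimeError("period not determined within max_runs")

c = 0b101
f = [min(x, x ^ c) for x in range(8)]   # a simple function with f(x) = f(x + c)
print(simon_find_period(f, 3, np.random.default_rng(1)))
```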

Simon's algorithm is at least a moderately convincing argument that BQP is 
strictly larger than BPP, although it is not a rigorous proof. However, Simon's 
problem is contrived in that it does not seem to have arisen in any other context. 
It did point the way to my discovery of the factoring algorithm, which will be 
discussed in the next section. The factoring algorithm is a much less convincing 
argument that BQP is larger than BPP, as nobody really knows the complexity of 
factoring. However, as factoring is a widely studied problem that is fundamental for 
public key cryptography [35], the quantum factoring algorithm brought widespread
attention to the field of quantum computing. 

5. The Factoring Algorithm 

For factoring an $L$-bit number $N$, the best classical algorithm known is the
number field sieve; this algorithm asymptotically takes time
$O(\exp(c L^{1/3} \log^{2/3} L))$. On a quantum computer, the quantum factoring
algorithm takes asymptotically $O(L^2 \log L \log\log L)$ steps. The key idea of
the quantum factoring algorithm is the use of a Fourier transform to find the
period of the sequence $u_i = x^i \pmod N$, from which period a factorization of
$N$ can be obtained. The period of this sequence is exponential in $L$, so this
approach is not practical on a digital computer. On a quantum computer, however,
we can find the period in polynomial time by exploiting the
$2^{2L}$-dimensional state space of $2L$ qubits, and taking a Fourier transform
over this space. The exponential dimensionality of this space permits us to take 
the Fourier transform of an exponential length sequence. How this works will be 
made clearer by the following sketch of the algorithm, the full details of which are 
in along with a quantum algorithm for finding discrete logarithms. 

The idea behind all the fast factoring algorithms (classical or quantum) is fairly 
simple. To factor N, find two residues mod N such that 

(5.1)    $$s^2 \equiv t^2 \pmod N$$

but $s \not\equiv \pm t \pmod N$. We now have

(5.2)    $$(s + t)(s - t) \equiv 0 \pmod N$$

and neither of these two factors is $0 \pmod N$. Thus, s + t must contain one factor of
N (and s - t another). We can extract this factor by finding the greatest common
divisor of s + t and N; this computation can be done in polynomial time using 
Euclid's algorithm. 
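
As a small worked instance of Eqs. (5.1) and (5.2), the sketch below splits N = 21 using s = 5 and t = 2. The function name `factors_from_squares` and the numbers are chosen here for illustration and do not come from the text.

```python
from math import gcd

def factors_from_squares(s, t, N):
    """Given s^2 = t^2 (mod N) with s != +-t (mod N), split N with two gcds."""
    assert (s * s - t * t) % N == 0, "need s^2 = t^2 (mod N)"
    assert (s - t) % N != 0 and (s + t) % N != 0, "need s != +-t (mod N)"
    return gcd(s + t, N), gcd(s - t, N)

# 5^2 = 25 = 4 = 2^2 (mod 21), while 5 is neither 2 nor -2 (mod 21).
print(factors_from_squares(5, 2, 21))  # (7, 3)
```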

In the quantum factoring algorithm, we find the multiplicative period r of a
residue x (mod N). This period r satisfies $x^r \equiv 1 \pmod N$. If we are lucky and
r is even, then both sides of this congruence are squares and we can try the above
factorization method. If we are just a little bit more lucky, then $x^{r/2} \not\equiv -1 \pmod N$,
and we obtain a factor by computing $\gcd(x^{r/2} + 1, N)$. The greatest common
divisor can be computed in polynomial time on a classical computer using Euclid's
algorithm.

It is a relatively simple exercise in number theory to show that for large N 
with two or more prime factors, at least half the residues x (mod N) produce 
prime factors using this technique, and that for most large N the fraction of good 
residues x is much higher; thus, if we try several different values for x, we have to 
be particularly unlucky not to obtain a factorization using this method. 
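
The reduction from period finding to factoring can be tried out classically on a toy modulus. In the sketch below the period is found by brute force, which stands in for the quantum step and takes exponential time in general. The helper names and the choice N = 15 are assumptions made for illustration.

```python
from math import gcd

def multiplicative_order(x, N):
    """Smallest r > 0 with x^r = 1 (mod N); brute force, requires gcd(x, N) = 1."""
    r, y = 1, x % N
    while y != 1:
        y = (y * x) % N
        r += 1
    return r

def factor_from_residue(x, N):
    """Return a nontrivial factor of N obtained from x, or None if x is unlucky."""
    if gcd(x, N) != 1:
        return gcd(x, N)          # x already shares a factor with N
    r = multiplicative_order(x, N)
    if r % 2 == 1:
        return None               # odd period: x^(r/2) is not defined
    half = pow(x, r // 2, N)
    if half == N - 1:
        return None               # x^(r/2) = -1 (mod N): the gcd would be trivial
    return gcd(half + 1, N)

N = 15
units = [x for x in range(2, N) if gcd(x, N) == 1]
good = [x for x in units if factor_from_residue(x, N) is not None]
print(factor_from_residue(7, N), len(good), len(units))  # 5 6 7
```

For this tiny modulus, six of the seven residues coprime to 15 (excluding 1) yield a factor; only x = 14, which is -1 (mod 15), fails.
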

We now need to explain what the quantum Fourier transform is. The quantum
Fourier transform on k qubits maps the state $V_a$, where a is considered as an integer
between 0 and $2^k - 1$, to a superposition of the states $V_b$ as follows:

(5.3)    $$V_a \to \frac{1}{2^{k/2}} \sum_{b=0}^{2^k-1} \exp(2\pi i ab/2^k)\, V_b$$

It is easy to check that this transformation defines a unitary matrix. It is not
as straightforward to implement this Fourier transform as a sequence of one- and
two-bit quantum gates. However, an adaptation of the Cooley-Tukey algorithm
decomposes this transformation into a sequence of $k(k-1)/2$ one- and two-bit gates.
More generally, the discrete Fourier transform over any product Q of small primes
(each of size at most log Q) can be performed in polynomial time on a quantum
computer. We will show how to break the above Fourier transform of Eq. (5.3) into
this product of two-bit gates at the end of this section.
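
The unitarity claim for Eq. (5.3) is easy to confirm numerically for small k. The following sketch builds the matrix entry by entry and checks that its columns are orthonormal; the function names are hypothetical and the code uses only the standard library.

```python
import cmath

def qft_matrix(k):
    """Matrix of Eq. (5.3): entry [b][a] = exp(2*pi*i*a*b / 2^k) / 2^(k/2)."""
    n = 2 ** k
    scale = n ** -0.5
    return [[scale * cmath.exp(2j * cmath.pi * a * b / n) for a in range(n)]
            for b in range(n)]

def max_unitarity_error(F):
    """Largest entry of |F^dagger F - I|; this is zero exactly when F is unitary."""
    n = len(F)
    worst = 0.0
    for a1 in range(n):
        for a2 in range(n):
            inner = sum(F[b][a1].conjugate() * F[b][a2] for b in range(n))
            worst = max(worst, abs(inner - (1 if a1 == a2 else 0)))
    return worst

print(max_unitarity_error(qft_matrix(3)))  # rounding error only
```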

We are now ready to give the quantum algorithm for factoring. What we do
is design a polynomial-size circuit which starts in the quantum state $V_{00\ldots0}$ and
whose output, with reasonable probability, lets us factor an L-bit number N in
polynomial time using a digital computer. This circuit has two main registers, the
first of which is composed of 2L qubits and the second of L qubits. It also requires
a few extra qubits of work space, which we do not mention in the summary below
but which are required for implementing the step (5.5) below.

We start by putting the computer into the state representing the superposition 
of all possible values of the first register: 

(5.4)    $$\frac{1}{2^L} \sum_{a=0}^{2^{2L}-1} V_a \otimes V_0$$

This can easily be done using 2L gates by putting each of the qubits in the first
register into the state $\frac{1}{\sqrt{2}}(V_0 + V_1)$.

We next use the value of a in the first register to compute the value $x^a \pmod N$
in the second register. This can be done using a reversible classical circuit for
computing $x^a \pmod N$ from a. Computing $x^a \pmod N$ using repeated squaring
takes $O(L^3)$ quantum gates using the grade school multiplication algorithm, and
asymptotically $O(L^2 \log L \log\log L)$ gates using fast integer multiplication (which
is actually faster only for moderately large values of L). This leaves the computer
in the state

(5.5)    $$\frac{1}{2^L} \sum_{a=0}^{2^{2L}-1} V_a \otimes V_{x^a \pmod N}$$
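
The classical content of this step is ordinary modular exponentiation by repeated squaring. A minimal sketch follows, with hypothetical names and toy values x = 7, N = 15, L = 4; it ignores reversibility and workspace, which the quantum circuit must handle.

```python
def modexp(x, a, N):
    """Compute x^a mod N by repeated squaring, scanning the bits of a from the low end."""
    result = 1
    square = x % N              # holds x^(2^j) mod N at step j
    while a > 0:
        if a & 1:               # bit j of a is set: multiply this power in
            result = (result * square) % N
        square = (square * square) % N
        a >>= 1
    return result

# Pair every value a of the first register with x^a mod N, as in (5.5).
x, N, L = 7, 15, 4
table = [(a, modexp(x, a, N)) for a in range(2 ** (2 * L))]
print(table[:5])  # [(0, 1), (1, 7), (2, 4), (3, 13), (4, 1)]
```

The second column repeats with period 4, which is the period r that the rest of the algorithm extracts.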



The next step is to take the discrete Fourier transform of the first register, as in
Equation (5.3). This puts the computer into the state

(5.6)    $$\frac{1}{2^{2L}} \sum_{a=0}^{2^{2L}-1} \sum_{c=0}^{2^{2L}-1} \exp(2\pi i ac/2^{2L})\, V_c \otimes V_{x^a \pmod N}$$

Finally, we measure the state of our machine. This yields the output
$V_c \otimes V_{x^j \pmod N}$ with probability equal to the square of the coefficient on
this state in the sum (5.6). Since many values of $x^a \pmod N$ are equal, many terms
in this sum contribute to each coefficient. All these a's giving the same value of
$x^a \pmod N$ can be represented as

$$a = a_0 + br,$$

where $a_0$ is the smallest of these a's and b is some integer between 0 and $\lceil 2^{2L}/r \rceil$.
Explicitly, this probability is:

(5.7)    $$\frac{1}{2^{4L}} \left| \sum_{b=0}^{\lfloor 2^{2L}/r \rfloor + \eta} \exp(2\pi i a_0 c/2^{2L}) \exp(2\pi i brc/2^{2L}) \right|^2$$

where $\eta$ is either 0 or 1, depending on the values of $2^{2L} \pmod r$ and $a_0$. This sum
in Eq. (5.7) is a geometric sum of unit complex numbers equally spaced around
the unit circle, and it is straightforward to check that this sum is small except
when these complex numbers point predominantly in the same direction. For this
to happen, we need that the angle between the two complex phases for b and b + 1
is on the order of the reciprocal of the number of possible b's, i.e., that

(5.8)    $$rc/2^{2L} = d + O(r/2^{2L})$$

for some integer d. We thus are likely to observe only values of c satisfying (5.8).
Recalling that $2^{2L} \approx N^2$, we can rewrite this equation to obtain

(5.9)    $$\frac{c}{2^{2L}} = \frac{d}{r} + O(1/N^2).$$

We know c and $2^{2L}$, and we want to find r. Since both d and r are less than N, if
the $O(1/N^2)$ in Eq. (5.9) were exactly $1/(2N^2)$, we would have

$$\left| \frac{c}{2^{2L}} - \frac{d}{r} \right| < \frac{1}{2N^2}$$

and $\frac{d}{r}$ would be the closest fraction to $c/2^{2L}$ with numerator and denominator less
than N. In actuality, it is likely to be one of the closest ones. Thus, all we need do
to find r is to round $c/2^{2L}$ to find all close fractions with denominators less than
N. This can be done in polynomial time using a continued fraction expansion, and
since we can check whether we have obtained the right value of r, we can search
the close fractions until we have obtained the correct one. We chose 2L as the size
of the first register in order to make d/r likely to be the closest fraction to $c/2^{2L}$
with numerator and denominator at most N.
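
The measurement statistics and the rounding step can both be reproduced classically for a toy case. The sketch below evaluates the probability of each outcome c directly from the sum (5.6) and then rounds $c/2^{2L}$ with the standard library's `Fraction.limit_denominator`, which returns the closest fraction with a bounded denominator. The function names and the choice x = 2, N = 21, L = 5 are assumptions for illustration; the brute-force evaluation takes time exponential in L and is not how a quantum computer proceeds.

```python
import cmath
from fractions import Fraction

def measurement_distribution(x, N, L):
    """Probability of observing each c in the first register, from the sum (5.6)."""
    q = 2 ** (2 * L)
    groups = {}                      # value of x^a mod N -> list of a's giving it
    for a in range(q):
        groups.setdefault(pow(x, a, N), []).append(a)
    probs = []
    for c in range(q):
        p = 0.0
        for a_list in groups.values():
            amp = sum(cmath.exp(2j * cmath.pi * a * c / q) for a in a_list) / q
            p += abs(amp) ** 2
        probs.append(p)
    return probs

def period_from_outcome(c, x, N, L):
    """Round c / 2^(2L) to the closest fraction d/r with r < N, then test r."""
    q = 2 ** (2 * L)
    r = Fraction(c, q).limit_denominator(N - 1).denominator
    return r if pow(x, r, N) == 1 else None

x, N, L = 2, 21, 5                   # 21 < 2^5; the period of 2 mod 21 is 6
probs = measurement_distribution(x, N, L)
likely = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:6]
print(sorted(likely), [period_from_outcome(c, x, N, L) for c in sorted(likely)])
```

Outcomes such as c = 171 and c = 853 round to 1/6 and 5/6 and give r = 6. An outcome whose d shares a factor with r rounds to a fraction in lower terms, fails the check $x^r \equiv 1$, and is reported as `None`, in which case one would examine other close fractions or repeat the run.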

More details of this algorithm can be found in |56|. Recently, Zalka jl6| has 
analyzed the resources required by this algorithm much more thoroughly, improving 
upon their original values in many respects. For example, he shows that you can use 
only 3L + o(L) qubits, whereas the original algorithm required 2L extra qubits for 
workspace, giving a total of 5L qubits. He also shows how to efficiently parallelize 
the algorithm to run on a parallel quantum computer. 

5.1. Implementing the Quantum Fourier Transform. We now show how to
break the discrete Fourier transform (Eq. (5.3)) into a product of two-bit gates, a
step which we previously postponed to this subsection. Let us consider the Fourier
transform on k + 1 bits.

(5.10)    $$V_a \to \frac{1}{2^{(k+1)/2}} \sum_{b=0}^{2^{k+1}-1} \exp(2\pi i ab/2^{k+1})\, V_b$$

We will assume that we have an expression for the Fourier transform on k qubits,
and show how to obtain an expression for the Fourier transform on k + 1 qubits
using only k + 1 additional gates.

We break the input space $V_a$ on k + 1 qubits into the tensor product of a k-qubit
space and a 1-qubit space, so that $V_a = V_{a_-} \otimes V_{a_0}$, where the (k + 1)-bit string a is
the concatenation of the k-bit string $a_-$ and the one-bit string $a_0$. Thus, $a_0$ is the
rightmost bit of the binary number a, i.e., the units bit. We similarly break the
output space $V_b$ into the tensor product of a 1-qubit space and a k-qubit space, but
this time we choose the first bit as the 1-qubit space, so $V_b = V_{b_k} \otimes V_{b_-}$, where $b_k$
is the leftmost bit of b, i.e. the bit with value $2^k$, and $b_-$ comprises the k rightmost
bits. Now, the Fourier transform becomes

(5.11)    $$V_{a_-} V_{a_0} \to \frac{1}{2^{(k+1)/2}} \sum_{b_k=0}^{1} \sum_{b_-=0}^{2^k-1} \exp\left(2\pi i \left(a_- b_k + \frac{a_- b_-}{2^k} + \frac{a_0 b_-}{2^{k+1}} + \frac{a_0 b_k}{2}\right)\right) V_{b_k} V_{b_-}$$

We now analyze this expression. First, the term $\exp(2\pi i a_- b_k)$ is always 1, and thus
can be dropped. The term $\exp(2\pi i a_- b_-/2^k)$ is the phase factor in the quantum
Fourier transform on k qubits. Thus, if we first perform the Fourier transform on
k qubits (which we can do by the induction hypothesis), we take $V_{a_-}$ to $V_{b_-}$ and
obtain this phase factor. The term $\exp(2\pi i a_0 b_-/2^{k+1})$ can be expressed as the
product of k gates, by letting the gate

$$T_{j,k} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & \exp(2\pi i/2^{k+1-j}) \end{pmatrix}$$

operate on the qubits corresponding to $a_0$ and $b_j$, by which we mean the bit of $b_-$
with value $2^j$, i.e., the (j + 1)'st bit from the right. This gate applies the phase factor
of $\exp(2\pi i/2^{k+1-j})$ if and only if both the bits $a_0$ and $b_j$ are 1. Finally, the term

$$\exp(2\pi i a_0 b_k/2) = (-1)^{a_0 b_k}$$

is the unitary gate

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

which takes $V_{a_0}$ to $V_{b_k}$ with the phase factor $(-1)^{a_0 b_k}$. We now see that we can
obtain the Fourier transform on k + 1 qubits by first applying the Fourier transform
on k qubits, taking $V_{a_-}$ to $\sum \exp(2\pi i a_- b_-/2^k) V_{b_-}$, next applying the gate $T_{j,k}$ on
the qubits $V_{a_0}$ and $V_{b_j}$ for j = 0 to k - 1, and finally by applying the gate H on
the qubit $V_{a_0}$ (yielding $V_{b_k}$ in this qubit). For those readers who are familiar with
the Cooley-Tukey fast Fourier transform, this is almost a direct translation of it to
a quantum algorithm. Multiplying the gates $T_{j,k}$ for a fixed k gives the "twiddle
factor" of the Cooley-Tukey FFT.

One objection that might be raised to this expansion of the Fourier transform 
is that it requires gates with exponentially small phases, which could not possibly 
be implemented with any physical accuracy. In fact, one can omit these gates 
and obtain an approximate Fourier transform which is close enough to the actual 
Fourier transform that it barely changes the probability that the factoring algorithm
succeeds [[L4| . This reduces the number of gates required for the quantum Fourier
transform from $O(k^2)$ to $O(k \log k)$.
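
A decomposition of this kind can be checked on a small state vector. The sketch below uses the common textbook ordering of the same gate set (Hadamards and controlled phase gates, followed by a classical reversal of the bit order); this ordering is an assumption and is not a literal transcription of the recursion above. All function names are hypothetical.

```python
import cmath

def apply_hadamard(state, p):
    """Apply H to qubit p (bit p of the basis-state index)."""
    s = 2 ** -0.5
    out = list(state)
    for i in range(len(state)):
        if not (i >> p) & 1:
            j = i | (1 << p)
            out[i] = s * (state[i] + state[j])
            out[j] = s * (state[i] - state[j])
    return out

def apply_controlled_phase(state, p, q, m):
    """Multiply by exp(2*pi*i / 2^m) when bits p and q of the index are both 1."""
    phase = cmath.exp(2j * cmath.pi / 2 ** m)
    return [amp * phase if (i >> p) & 1 and (i >> q) & 1 else amp
            for i, amp in enumerate(state)]

def qft_by_gates(state, k):
    """QFT on k qubits from H and controlled-phase gates; returns (state, gate count)."""
    gates = 0
    for p in range(k - 1, -1, -1):
        state = apply_hadamard(state, p)
        gates += 1
        for q in range(p - 1, -1, -1):
            state = apply_controlled_phase(state, p, q, p - q + 1)
            gates += 1
    # The circuit leaves the output bits in reversed order; undo that classically.
    reversed_state = [0j] * len(state)
    for i, amp in enumerate(state):
        rev = int(format(i, "0{}b".format(k))[::-1], 2)
        reversed_state[rev] = amp
    return reversed_state, gates

k, a = 4, 5
basis = [1 if i == a else 0 for i in range(2 ** k)]
out, gates = qft_by_gates(basis, k)
direct = [cmath.exp(2j * cmath.pi * a * b / 2 ** k) / 2 ** (k / 2) for b in range(2 ** k)]
print(gates, max(abs(u - v) for u, v in zip(out, direct)))
```

The gate count printed is k Hadamards plus k(k - 1)/2 controlled phase gates. Dropping the controlled phase gates whose angle falls below some threshold gives the kind of approximate transform described above.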

6. Grover's Algorithm 

Another very important algorithm in quantum computing is L. K. Grover's 
search algorithm, which searches an unordered list of N items (or the range of 
an efficiently computable function) for a specific item in time $O(\sqrt{N})$, an improvement
on the optimal classical algorithm, which must look at N/2 items on average
before finding a specific item fl25|| . The technique used in this algorithm can be 
applied to a number of other problems to also obtain a square root speed-up p6f . 
If you are searching an unordered database, this square root speed-up is as good 
as a quantum computer can do; this is proved using techniques developed in 0. 
Finally, a generalization of both Grover's search algorithm and the lower bound 
above gives tight bounds on how much a quantum computer can amplify a quan- 
tum procedure that has a given probability of success [ jlO| . The quantum search 
algorithm can be thought of in these terms; the procedure is just that of choosing 
a random element of the N-element list, so the probability of success is 1/N. A
quantum computer can amplify this probability to near-unity by using $O(\sqrt{N})$
iterations while a classical computer requires order N iterations. I sketch Grover's
search algorithm below. 

Grover's algorithm uses only three transformations. The first is the transformation
$W = H^{\otimes k}$, which is the transformation obtained by applying the matrix

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

to each qubit. It is easy to check that $W^2 = \mathrm{Id}$, because $H^2 = \mathrm{Id}$. The second
transformation is $Z_0$, which takes the basis vector $V_0$ to $-V_0$ and leaves $V_i$
unchanged for $i \neq 0$. The third is $Z_t$, which takes $V_t$ to $-V_t$ and leaves $V_i$ unchanged
for $i \neq t$, where the t'th element of the list is the one we are trying to find. At
first glance, it might seem that we need to know t to apply $Z_t$; however, if we can
design a quantum circuit that tests whether an integer i is equal to t, then we can
use it to perform the transformation $Z_t$. For example, if we are searching for a
specific element in an unordered list, it is fairly straightforward to write a program 
that tests whether the i'th element of the list is indeed the desired element, and 
negates the phase if it is, without knowing where the desired element is in the list. 
Similarly, if we are searching for a solution to some mathematical problem, we need 
only to be able to efficiently test whether a given integer i encodes a solution to 
the problem. 

Suppose that we are searching among $N = 2^k$ items, which are encoded by the
integers 0 to $N - 1$. Here we use k qubits to keep track of the items. We will now
calculate that if we start in the superposition

$$\sum_{i=0}^{N-1} a_i V_i$$

then the transformation $-WZ_0W$ leaves us in the state

$$\sum_{i=0}^{N-1} (2m - a_i) V_i$$

where $m = \frac{1}{N} \sum_{i=0}^{N-1} a_i$ is the mean of all the amplitudes. The proof of this follows
from the observation that after the transform W, the amplitude of $V_0$ is $\sqrt{N} m$.
Recall that $W^2 = \mathrm{Id}$. These two observations can be used to show that the
transformation $WZ_0W$ extracts the mean m in the amplitude of $V_0$, negates it, and
redistributes it negated over all the basis states $V_i$. The transformation $WZ_0W$
thus takes $\sum_i a_i V_i$ to $\sum_i (a_i - 2m) V_i$.
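
The identity can be verified directly on a short list of amplitudes. In the sketch below, W is implemented as a normalized fast Walsh-Hadamard transform; the function names and the sample amplitudes are arbitrary choices made for illustration.

```python
import math

def walsh_hadamard(amps):
    """Apply W (H on each of k qubits) to a list of 2^k amplitudes."""
    out = list(amps)
    n, h = len(out), 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = out[i], out[i + h]
                out[i], out[i + h] = u + v, u - v
        h *= 2
    return [x / math.sqrt(n) for x in out]

def minus_w_z0_w(amps):
    """Apply W, negate the amplitude of V_0, apply W again, and negate everything."""
    mid = walsh_hadamard(amps)
    mid[0] = -mid[0]
    return [-x for x in walsh_hadamard(mid)]

amps = [0.1, 0.5, -0.2, 0.3, 0.0, 0.4, 0.6, -0.1]   # arbitrary, not normalized
m = sum(amps) / len(amps)
result = minus_w_z0_w(amps)
print(max(abs(r - (2 * m - a)) for r, a in zip(result, amps)))  # rounding error only
```

Because every step is linear, the check does not require the amplitudes to be normalized.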

We are now in a position to describe Grover's algorithm in detail. We start
in the equal superposition of all $V_i$, i.e. the state $\frac{1}{\sqrt{N}} \sum_{i=0}^{N-1} V_i$. We then repeat
the transformation $Z_t W Z_0 W$ for $c\sqrt{N}$ iterations, for the appropriately chosen
constant c. What this accomplishes is to gradually increase the amplitude on $V_t$ at
the expense of all the other amplitudes, until after $c\sqrt{N}$ iterations the amplitude
on $V_t$ is nearly unity. Suppose that we have reached a point where the amplitude
on $V_i$ is $\alpha$ for all $i \neq t$ and $\beta$ for $V_t$. It is easy to see that in the next step, these
amplitudes are $2m - \alpha$ and $\beta + 2m$, respectively, where $m = (\beta + (N - 1)\alpha)/N$ is
the mean amplitude. When $\beta$ is small, $m \approx \alpha \approx 1/\sqrt{N}$, and thus the amplitudes
on $V_i$, $i \neq t$ decrease slightly and the amplitude on $V_t$ increases by approximately
$2/\sqrt{N}$. I will not go into the details in this write-up, but this at least gives the
intuition that, after $c\sqrt{N}$ steps, we obtain a state very close to $V_t$. There are many
variations of this algorithm, including ones that work when there is more than one
desired solution. For more details, I recommend reading Grover's paper ]2q] .
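
The growth of the amplitude on $V_t$ can be followed numerically by tracking all N amplitudes. The sketch below is a classical simulation under stated assumptions: each iteration applies $Z_t$ first and then the inversion about the mean $-WZ_0W$, with m taken as the mean after the sign flip. This is a common formulation that differs from the ordering written above by an overall sign and an offset of one iteration. The values N = 64 and t = 42 and the constant c = π/4 are illustrative choices.

```python
import math

def grover_success_probability(N, t, iterations):
    """Track all N amplitudes through Z_t followed by inversion about the mean."""
    amps = [1 / math.sqrt(N)] * N          # equal superposition of all V_i
    for _ in range(iterations):
        amps[t] = -amps[t]                 # Z_t: flip the sign of the marked item
        m = sum(amps) / N                  # mean amplitude after the sign flip
        amps = [2 * m - a for a in amps]   # -W Z_0 W: a_i -> 2m - a_i
    return amps[t] ** 2

N, t = 64, 42                              # hypothetical list size and marked index
steps = int(math.pi / 4 * math.sqrt(N))    # c = pi/4 in c*sqrt(N)
for n in (0, 1, 3, steps):
    print(n, round(grover_success_probability(N, t, n), 4))
```

With these values `steps` is 6, and the probability of observing the marked item rises from 1/64 to above 0.99. Continuing to iterate past that point makes the probability fall again, which is why the constant c matters.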

Finally, as Feynman suggested, it appears that quantum computing is good at 
computing simulations of quantum mechanical dynamics. I will not be discussing 
this. Some work in this regard has appeared in jl], |2^, flsj ], but much remains to be 
done. 